Abstract

Summary: Sequences are probably the most common piece of information in
sites providing biological data resources, particularly those related to genes
and proteins. Multiple visual representations of the same sequence can be
found across those sites. This can lead to an inconsistency compromising both
the user experience and usability while working with graphical representations
of a sequence. Furthermore, the code of the visualisation module is commonly
embedded and merged with the rest of the application, making it difficult to
reuse it in other applications. In this paper, we present a BioJS component for
visualising sequences with a set of options supporting a flexible configuration of
the visual representation, such as formats, colours, annotations, and columns,
among others. This component aims to facilitate a common representation
across different sites, making it easier for end users to move from one site to
another.

Availability: http://www.ebi.ac.uk/Tools/biojs; 
http://dx.doi.org/10.5281/zenodo.8299 
Introduction

Visualising biological data on the web is a common practice on
sites providing bio-oriented services and resources. A wide variety
of JavaScript libraries are being used to build pieces of software
capable of representing bio-entities such as DNA sequences, protein
sequences (http://www.uniprot.org), protein structures
(http://www.wwpdb.org), ontology trees, protein-protein interactions
(http://www.ebi.ac.uk/intact/), and others. As a result, the same
bio-entity is often rendered in several different ways, one for each
implementation. In many cases,
such implementations are difficult to maintain, test, and reuse because
they are developed with only one use case in mind. Furthermore,
user experience (UX) and usability across different sites may be
compromised.

One particular type of data commonly affected by multiple representations
is the sequence, either a DNA or protein sequence. A
sequence is a common bio-entity present in most sites offering biological
data resources. Figure 1 shows different visual representations
of a protein sequence as it can be found in UniProt
(http://www.uniprot.org), Dasty (http://www.ebi.ac.uk/dasty) and Ensembl
(http://www.ensembl.org), among others. Several features recur
across these representations. However, features such as formatting,
indexing numbers, annotations, marks, colouring tags, and
even the capability of user interaction are not integrated in one
reusable piece of code. Instead, multiple representations prevail.
Furthermore, web developers often make their own isolated efforts
to reproduce those views for their sites and, in most cases, the
representations are not identical, no documentation is available, and
the code is not portable to other sites.

In this paper, a reusable component to visualise sequences is presented
under the BioJS set of minimum standards for visualisation
of biological components. BioJS is a community-driven standard
to develop visualisation functionality. The library is developed
using well-established methodologies and object-oriented design
with inheritance that facilitates rapid development, reuse, extension,
integration and deployment of web applications.

The Sequence component 

Exploring sequence visualisation across different sites reveals a set
of features that should be supported by a single, reusable, and
well-documented piece of code, capable of painting sequences on the
web in a consistent manner. To this end, BioJS provides a baseline
for JavaScript coding and development to create pieces of reusable
code, called components. Creating a new Sequence component consists
of extending a core BioJS class and defining three core concepts:
options, methods and events. Options are the data required
by the component for initialisation, while methods and events are
actions supported at run time. Methods are invoked externally,
while events are triggered inside the component and exposed to external
listeners.
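
The split between options, methods and events can be sketched in plain JavaScript. This is a minimal, hypothetical illustration and not the actual BioJS code: the class names `BaseComponent` and `SequenceSketch`, the option names, and the event name `onSelectionChanged` are all assumptions made for the example.

```javascript
// Hypothetical base class standing in for a core BioJS class (not the real API).
class BaseComponent {
  constructor(options = {}) {
    // Options: data supplied at initialisation, merged over the class defaults.
    this.opt = { ...this.constructor.defaults, ...options };
    this.listeners = {};
  }

  // Register an external listener for a named event.
  on(eventName, callback) {
    if (!this.listeners[eventName]) this.listeners[eventName] = [];
    this.listeners[eventName].push(callback);
  }

  // Raise an event from inside the component and notify every listener.
  raise(eventName, payload) {
    (this.listeners[eventName] || []).forEach((callback) => callback(payload));
  }
}

// Hypothetical sequence component built by extending the base class.
class SequenceSketch extends BaseComponent {
  // Method: invoked from outside to change the component's state.
  setSelection(start, end) {
    const length = this.opt.sequence.length;
    if (start < 1 || end > length || start > end) {
      throw new RangeError(`Selection ${start}..${end} is outside 1..${length}`);
    }
    this.selection = { start, end };
    // Event: triggered inside the component and exposed to listeners.
    this.raise("onSelectionChanged", { start, end });
  }

  // Method: return the selected residues (positions are 1-based, inclusive).
  selectedResidues() {
    if (!this.selection) return "";
    const { start, end } = this.selection;
    return this.opt.sequence.slice(start - 1, end);
  }
}

// Assumed default options; the real component defines its own set.
SequenceSketch.defaults = { sequence: "", format: "FASTA", columns: 10 };

const seq = new SequenceSketch({ sequence: "MVNKDVKQTTAFGAPVWDDN", format: "CODATA" });
const received = [];
seq.on("onSelectionChanged", (range) => received.push(range));
seq.setSelection(3, 7);
console.log(seq.selectedResidues()); // prints "NKDVK"
```

In this sketch the caller passes only the options it wants to override, while the remaining ones fall back to the defaults.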

Methods and events allow the component to communicate with other
components as well as web applications. Figure 2 shows a working
example implemented within the Biotea project. This example
shows communication between two component instances, the
Sequence component and the Protein3D component. When a region
(highlighted in yellow) on the sequence is selected, a selection
action is automatically fired in Protein3D. Additionally, Sequence
supports a set of options to change the visual representation of the
sequence by using different formats, colours, indexing numbers,
annotations and more. This helps deployment because the component
can be easily adapted to a particular need. Figure 3 shows an example
of the Sequence component displaying the protein P918283 in
CODATA format.
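
The kind of wiring described for Figure 2 might look like the following sketch, in which a selection event from a sequence view drives a highlight method on a structure view. The factory functions `makeSequenceView` and `makeStructureView`, and the names `onSelectionChange`, `select` and `highlightRegion`, are hypothetical stand-ins and do not reproduce the BioJS API.

```javascript
// Hypothetical stand-in for a sequence component that exposes a selection event.
function makeSequenceView(sequence) {
  const handlers = [];
  return {
    sequence,
    // Event registration: external code subscribes to selection changes.
    onSelectionChange(handler) {
      handlers.push(handler);
    },
    // Simulates the user selecting residues start..end (1-based, inclusive).
    select(start, end) {
      handlers.forEach((handler) => handler({ start, end }));
    },
  };
}

// Hypothetical stand-in for a 3D structure component that exposes a method.
function makeStructureView() {
  return {
    highlighted: null,
    // Method: called from outside to highlight a residue range.
    highlightRegion(start, end) {
      this.highlighted = { start, end };
    },
  };
}

const sequenceView = makeSequenceView("MVNKDVKQTTAFGAPVWDDN");
const structureView = makeStructureView();

// Glue code: the hosting page connects the event of one component
// to the method of the other, so neither needs to know about its peer.
sequenceView.onSelectionChange(({ start, end }) => {
  structureView.highlightRegion(start, end);
});

sequenceView.select(5, 12);
console.log(structureView.highlighted);
```

Keeping the glue code in the page means either component can be replaced without touching the other.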

Like any other BioJS component, the Sequence component is well
documented and has been tested during development, not only for
functionality but also for usability. BioJS makes it easier to document
the code by adding annotations that are later exposed as a web
page. Thus, human-friendly documentation is generated without
Figure 1. Multiple representations compiled as one flexible BioJS component.
[Figure 2 image: the Sequence component displaying entry P77872 in blocks of ten residues, with PDB alignment 2A9E.A (1..505).]

Figure 2. Example of communication between Sequence and Protein3D components.



[Figure 3 image: entry P918283 rendered in CODATA format, with column indices every five residues and the starting position shown at the left of each row.]

Figure 3. Example displaying the sequence corresponding to the UniProt accession P918283. The part highlighted in yellow denotes the
current selection, the black pop-up box indicates what the interval is with every move of the pointer. Green highlight denotes an annotation
on that interval. Multiple annotations are supported.



any additional effort. BioJS web pages for components are compiled 
in a registry that acts as a showcase of working examples extracted 
from the component annotations. The registry makes it easier for 
both developers and end users to understand components and their 
functionality. Once a component has met the BioJS guidelines, it 
becomes a candidate to be submitted and publicly shared in the 
common repository of components, the EBI BioJS registry
(http://www.ebi.ac.uk/Tools/biojs/registry/). There, it is possible to find
more information about options, installation, methods, and events 
(http://www.ebi.ac.uk/Tools/biojs/registry/Biojs.Sequence.html). 

Future work 

Currently, the Sequence component supports the visualisation of a
single strand. However, in some cases it would be more useful
to display similarities between two or more sequences. Another
possible extension is to use this component as a base for visualising
multiple aligned sequences. Alignment algorithms could be run
on the server side or consumed from a web service, while the
component would be in charge of painting the similarities, taking
advantage of already developed features such as colouring, highlighting,
and tagging.
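
As an illustration of how such an extension could divide the work, the sketch below takes two sequences that have already been aligned elsewhere and returns the column ranges a component could then highlight. The function `matchingRegions` is hypothetical, and the use of "-" as the gap character is an assumption; none of this is an existing feature of the Sequence component.

```javascript
// Hypothetical helper: find runs of identical residues in two pre-aligned
// sequences of equal length, where "-" is assumed to mark an alignment gap.
function matchingRegions(alignedA, alignedB) {
  if (alignedA.length !== alignedB.length) {
    throw new Error("Aligned sequences must have the same length");
  }
  const regions = [];
  let runStart = null; // 1-based column where the current run of matches began
  for (let i = 0; i < alignedA.length; i++) {
    const isMatch = alignedA[i] === alignedB[i] && alignedA[i] !== "-";
    if (isMatch && runStart === null) {
      runStart = i + 1; // a new run of matches starts at this column
    }
    if (!isMatch && runStart !== null) {
      regions.push({ start: runStart, end: i }); // run ended at the previous column
      runStart = null;
    }
  }
  // Close a run that extends to the last column.
  if (runStart !== null) {
    regions.push({ start: runStart, end: alignedA.length });
  }
  return regions;
}

// Toy input invented for the example; columns are 1-based alignment columns.
const regions = matchingRegions("MVNK-DVKQ", "MVQKADVRQ");
console.log(regions);
```

Each returned range could be passed to whatever highlighting facility the component exposes, so the alignment itself stays on the server or in a web service.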

Collaborative work and social networking are nowadays mechanisms
for knowledge construction. Such features can be integrated
into the Sequence component so end users can submit sequences
and annotations to public sequence databases such as UniProt.
Comments and references could also be added, providing valuable
information for researchers during their investigations.

Software availability 

Zenodo: Sequence BioJS component for visualising sequences, doi:
10.5281/zenodo.8299

GitHub: BioJS, http://www.ebi.ac.uk/Tools/biojs.
Jeremy Goecks

Computational Biology Institute, George Washington University, Washington, DC, USA 

Approved: 17 March 2014 

Referee Report: 17 March 2014 
doi:10.5256/f1000research.3717.r3802

Here, the authors present Sequence, a web-based visualization component for biological sequence data 
implemented in JavaScript. Investigators can use Sequence to visualize both DNA and protein 
sequences, either as a standalone visualization or together with other visualizations. 

Strengths of Sequence include (a) the ability to customize sequence using options and (b) integration of 
sequence via events. These features ensure that Sequence can be used in a wide variety of applications. 

What is missing from this manuscript is a description of how well Sequence scales to large sequences 
and whether a Sequence visualization can be updated dynamically in response to events from other 
components. 

Overall, Sequence is a solid contribution to web-based visualization that is useful as it is and forms the 
foundation for more complex web-based sequence visualization in the future. 

I have read this submission. I believe that I have an appropriate level of expertise to confirm that 
it is of an acceptable scientific standard. 

Competing Interests: No competing interests were disclosed.

Christoph Gille

Computational Biochemistry Group, Institute of Biochemistry, University Medicine Berlin (Charité), Berlin,
Germany

Approved: 17 March 2014 

Referee Report: 17 March 2014 
doi:10.5256/f1000research.3717.r3694

General

The authors present the first re-usable JavaScript based sequence component. It can be used in web 
applications dealing with bio-polymers like proteins and nucleotide sequences and can also interact with 
other parts of the website via events. 

Previously, Java applets have been used for interactive web content. However, Java constitutes an
additional layer of software and thereby carries its own set of technical problems and risks. For this
reason, JavaScript is being increasingly used at the client side. In this respect, the development of BioJS
components follows a general trend.

F1000Research 2014, 3:52 Last updated: 14 JUL 2014

The BioJS registry is the first and only framework plus standard for interactive web components, and 
Sequence will be one of the most important components following the BioJS specification. Therefore, I 
expect that the Sequence component will be widely used in bioinformatics web services. Even if current 
features might not satisfy all needs, the BioJS format allows for extensions and incorporation of new 
features with the source code clear and well documented, allowing developers to change it to their 
requirements. 

Manuscript 

I would suggest replacing the word "compiled" with another word (in the figure 1 legend and in the third 
paragraph of the Sequence component section) as it might be mistaken for source code getting compiled 
on a server like on the Debian Linux server. 

The manuscript does not provide answers to some important questions:

1. Is the length of the sequence limited?

2. Is the sequence immutable? Or could it change, as with alternative splicing? Can parts of the
sequence be hidden, such as when cutting off a signal peptide?

3. "Indexing numbers" - does the numbering support PDB insertion codes?

It would be good if these points could be clarified in the manuscript.

Example 

For demonstration, the authors have coupled the sequence view with a BioJS 3D component.
With the newest Java, the JMol applet fails to start with the message: "Your security system has blocked
an untrusted [...]". I expect that the line Permissions: sandbox in the jar-file manifest and signing the jar-file
will fix the problem.

The authors should also consider using a JavaScript based 3D visualization.

API

On events like 'Annotation Clicked', there is no parameter indicating whether the context pop-up trigger 
(right click, long touch) is active and what modifier keys like Shift and Ctrl are pressed - this should be 
made clearer for ease of use. 

I have read this submission. I believe that I have an appropriate level of expertise to confirm that 
it is of an acceptable scientific standard. 

Competing Interests: No competing interests were disclosed. 